76 research outputs found

    Segmental K-Means Learning with Mixture Distribution for HMM Based Handwriting Recognition

    This paper investigates the performance of hidden Markov models (HMMs) for handwriting recognition. The Segmental K-Means algorithm is used to update the transition and observation probabilities, instead of the Baum-Welch algorithm. Observation probabilities are modelled as multivariate Gaussian mixture distributions. A deterministic clustering technique estimates the initial parameters of an HMM, and the Bayesian information criterion (BIC) is used to select the model topology. Features are extracted from the grey-scale image with the wavelet transform, which avoids binarization of the image.
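
    The Segmental K-Means idea (Viterbi training: align, then re-estimate from the hard segmentation) can be sketched as follows. This is an illustrative NumPy version, not the paper's implementation: it uses a single Gaussian per state rather than the mixtures described above, and all names are hypothetical.

```python
import numpy as np

def log_gauss(x, mu, var):
    # log density of a scalar observation under each state's Gaussian
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def viterbi(x, log_pi, log_A, mu, var):
    # most likely state path for 1-D observations x
    T, S = len(x), len(mu)
    delta = np.empty((T, S))
    psi = np.zeros((T, S), dtype=int)
    delta[0] = log_pi + log_gauss(x[0], mu, var)
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A      # (from, to)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_gauss(x[t], mu, var)
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

def segmental_kmeans(x, S, n_iter=10):
    # crude equal-length initial segmentation
    path = np.minimum(np.arange(len(x)) * S // len(x), S - 1)
    mu = np.array([x[path == s].mean() for s in range(S)])
    var = np.array([x[path == s].var() + 1e-6 for s in range(S)])
    log_pi = np.full(S, -np.log(S))
    log_A = np.full((S, S), -np.log(S))
    for _ in range(n_iter):
        path = viterbi(x, log_pi, log_A, mu, var)
        # hard re-estimation from the Viterbi segmentation
        counts = np.full((S, S), 1e-3)              # smoothing avoids log(0)
        for t in range(1, len(x)):
            counts[path[t - 1], path[t]] += 1
        log_A = np.log(counts / counts.sum(axis=1, keepdims=True))
        for s in range(S):
            seg = x[path == s]
            if len(seg):
                mu[s], var[s] = seg.mean(), seg.var() + 1e-6
    return mu, var, log_A
```

    Unlike Baum-Welch, each frame contributes to exactly one state per iteration, which is what makes the update a (segmental) K-means step.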

    Clustering with EM: Complex Models vs. Robust Estimation


    Design of Experiments for Screening

    The aim of this paper is to review methods of designing screening experiments, ranging from designs originally developed for physical experiments to those especially tailored to experiments on numerical models. The strengths and weaknesses of the various designs for screening variables in numerical models are discussed. First, classes of factorial designs for experiments to estimate main effects and interactions through a linear statistical model are described, specifically regular and nonregular fractional factorial designs, supersaturated designs and systematic fractional replicate designs. Generic issues of aliasing, bias and cancellation of factorial effects are discussed. Second, group screening experiments are considered, including factorial group screening and sequential bifurcation. Third, random sampling plans are discussed, including Latin hypercube sampling and sampling plans to estimate elementary effects. Fourth, a variety of modelling methods commonly employed with screening designs are briefly described. Finally, a novel study demonstrates six screening methods on two frequently used exemplars, and their performances are compared.
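
    Latin hypercube sampling, one of the random sampling plans mentioned above, is simple to sketch: each of the d dimensions is split into n equal strata, and exactly one point falls in each stratum of each dimension. A minimal illustrative implementation (not the paper's code):

```python
import numpy as np

def latin_hypercube(n, d, seed=None):
    """n points in [0, 1)^d: per dimension, one random draw inside
    each of n equal strata, with strata assigned by permutation."""
    rng = np.random.default_rng(seed)
    u = rng.random((n, d))                               # position within stratum
    strata = np.array([rng.permutation(n) for _ in range(d)]).T
    return (strata + u) / n
```

    The stratification guarantees good one-dimensional projections, which is why such plans suit screening: each input is exercised across its whole range even with few runs.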

    Analysis of Incomplete Data from Highly Fractionated Experiments

    An iterative method is proposed that provides a simple and flexible way to consider many models simultaneously. The method can be implemented with existing software, results in computational savings, and promotes experimenter involvement.

    A Critical Look at Accumulation and Related Methods

    Using accumulation analysis on ordered categorical data can often result in the detection of spurious effects.

    Inference and sequential design

    If an experiment is designed sequentially, repeated-sampling inference cannot necessarily be made using distributional results that are valid for fixed designs. A few simple illustrative examples and alternative approaches are given. Sometimes the sequential nature of the design can be ignored asymptotically. Links are forged with inference for stochastic processes and with missing-data problems.
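
    The core point, that fixed-design distributional results break down under sequential designs, can be illustrated with a small simulation: repeatedly testing accumulating data against the fixed-sample 5% critical value inflates the overall type I error well above 5%. The setup below (look sizes, simulation counts) is illustrative and not from the paper:

```python
import numpy as np

def sequential_rejection_rate(n_sims=4000, looks=(20, 40, 60, 80, 100), seed=0):
    """Under H0 (mean 0, sd 1 known), apply a z-test at each interim
    look using the fixed-sample critical value 1.96; count a rejection
    if any look rejects."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(0.0, 1.0, max(looks))
        for n in looks:
            z = x[:n].mean() * np.sqrt(n)        # known sd = 1
            if abs(z) > 1.96:
                rejections += 1
                break
    return rejections / n_sims
```

    With five equally spaced looks the overall rejection rate under the null is roughly 14%, not 5%, which is exactly the kind of distortion fixed-design inference ignores.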

    The effect of two-stage sampling on the F-statistic

    The assumption of iid observations that underlies many statistical procedures is called into question when analyzing complex survey data. The population structure, particularly the existence of clusters in two-stage samples that usually exhibit positive intracluster correlation, invalidates the independence assumption. Kish and Frankel (1974) investigated the impact of this fact on regression analysis using the standard sample-survey-theory framework; Campbell (1977) and Scott and Holt (1982) used the linear model framework. In general, ordinary least squares (OLS) procedures are unbiased, though not fully efficient, for estimation of the regression coefficients, but serious difficulties can arise when OLS estimators are used for second-order terms. Variances of the OLS estimators for the regression coefficients can be larger (sometimes much larger) than the usual OLS variance expression would indicate. Failure to consider this possibility leads to underestimation of variances, with consequences for confidence intervals. This article follows this effect through to the F statistic, because of its importance to hypothesis tests and confidence ellipsoids. Our major aim is to investigate the effect of intracluster correlation on the F statistic. We propose a diagnostic measure that identifies when the ordinary F statistic is likely to be affected, and give a decomposition in terms of the contributions of the individual regressors and their cross-products, based on a similar decomposition for the projection matrix in Appendix A. We establish numerically and theoretically the effectiveness of this measure in understanding the degree of distortion of F by intracluster correlation. The measure leads to a correction of the F test for unknown intracluster correlation. This is a slightly simpler numerical procedure than generalized least squares (GLS), since it does not require iteration. The correction is shown to perform at least as well as GLS in a simulation study.
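
    The variance understatement described here is easy to demonstrate. For a cluster-constant regressor with intracluster correlation rho and m observations per cluster, the true variance of the OLS slope exceeds the naive iid-formula estimate by roughly the design effect 1 + (m - 1) * rho. A simulation sketch with illustrative parameters (not the paper's study):

```python
import numpy as np

def ols_variance_inflation(n_clusters=50, m=10, rho=0.5, n_sims=2000, seed=0):
    """Empirical variance of the OLS slope under cluster-correlated
    errors, divided by the average naive (iid-formula) variance estimate."""
    rng = np.random.default_rng(seed)
    # cluster-level regressor: constant within each cluster
    x = np.repeat(rng.normal(size=n_clusters), m)
    x = x - x.mean()
    sxx = (x ** 2).sum()
    betas, naive = [], []
    for _ in range(n_sims):
        u = np.repeat(rng.normal(0, np.sqrt(rho), n_clusters), m)  # shared cluster effect
        e = rng.normal(0, np.sqrt(1 - rho), n_clusters * m)        # unit-level noise
        y = 2.0 * x + u + e            # total error variance 1, ICC = rho
        b = (x * y).sum() / sxx        # OLS slope (x already centred)
        resid = y - y.mean() - b * x
        s2 = (resid ** 2).sum() / (len(y) - 2)
        betas.append(b)
        naive.append(s2 / sxx)         # usual OLS variance formula
    return np.var(betas) / np.mean(naive)
```

    With m = 10 and rho = 0.5 the inflation factor is about 1 + 9 * 0.5 = 5.5, so naive confidence intervals are several times too short, which is the distortion the article traces through to the F statistic.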

    An Investigation of OA-based Methods for Parameter Design Optimization

    There exist simpler alternatives for analyzing the results of a designed experiment than the orthogonal array methods proposed by Taguchi.

    The use of a canonical form in the construction of locally optimal designs for non-linear problems

    Optimal experimental designs for non-linear problems depend on the values of the underlying unknown parameters in the model. For various reasons there is interest in providing explicit formulae for the optimal designs as a function of the unknown parameters. This paper shows that, for a certain class of generalized linear models, the problem can be reduced to a canonical form. This simplifies the underlying problem, and designs are constructed for several contexts with a single variable using geometric and other arguments.
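
    A classical instance of this canonical reduction is the logistic model: with the canonical linear predictor eta, the locally D-optimal design puts equal weight on two symmetric points +c and -c, where c maximizes the determinant of the 2x2 information matrix, proportional to w(c)^2 * c^2 with w the logistic GLM weight. The grid-search check below is an illustrative verification of that known result, not the paper's geometric construction:

```python
import numpy as np

def logistic_d_optimal_point():
    """For the symmetric two-point design {+c, -c} with equal weights
    on the canonical scale, the information matrix is diag(w(c), w(c)*c^2),
    so det M is proportional to w(c)^2 * c^2. Maximise by grid search."""
    c = np.linspace(0.1, 4.0, 40001)
    w = np.exp(c) / (1.0 + np.exp(c)) ** 2   # logistic weight function
    det = (w * c) ** 2                       # determinant up to a constant
    return c[det.argmax()]
```

    The maximizer is c of approximately 1.543, i.e. the design points sit where the success probability is about 0.176 and 0.824; mapping back from the canonical scale gives the design for any particular (beta0, beta1).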